
Keyword Search Result

[Keyword] deep learning (149 hits)

81-100 hits (of 149)

  • Benchmarking Modern Edge Devices for AI Applications

    Pilsung KANG  Jongmin JO  

     
    PAPER-Computer System

    Publicized: 2020/12/08
    Vol: E104-D No:3
    Page(s): 394-403

    AI (artificial intelligence) has grown at an overwhelming speed for the last decade, to the extent that it has become one of the mainstream tools that drive the advancements in science and technology. Meanwhile, the paradigm of edge computing has emerged as one of the foremost areas in which applications using the AI technology are being most actively researched, due to its potential benefits and impact on today's widespread networked computing environments. In this paper, we evaluate two major entry-level offerings in the state-of-the-art edge device technology, which highlight increased computing power and specialized hardware support for AI applications. We perform a set of deep learning benchmarks on the devices to measure their performance. By comparing their performance with that of GPU (graphics processing unit) accelerated systems on other platforms, we assess the computational capability of the modern edge devices featuring a significant amount of hardware parallelism.
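
    A benchmarking run of this kind can be reproduced in outline with a short timing harness. The sketch below assumes PyTorch and uses a torchvision ResNet-50 as a stand-in for the paper's benchmark models; the batch size, warm-up, and iteration counts are illustrative choices rather than the paper's setup.

```python
import time
import torch
import torchvision.models as models

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in benchmark model; the paper's own model set may differ.
model = models.resnet50(weights=None).eval().to(device)
x = torch.randn(8, 3, 224, 224, device=device)  # illustrative batch

with torch.no_grad():
    for _ in range(10):            # warm-up iterations
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()   # flush pending GPU work before timing

    start = time.perf_counter()
    iters = 50
    for _ in range(iters):
        model(x)
    if device.type == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"{elapsed / (iters * x.size(0)) * 1e3:.2f} ms per image on {device}")
```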

  • Digital Watermarking Method for Printed Matters Using Deep Learning for Detecting Watermarked Areas

    Hiroyuki IMAGAWA  Motoi IWATA  Koichi KISE  

     
    PAPER

    Publicized: 2020/10/07
    Vol: E104-D No:1
    Page(s): 34-42

    There are several technologies, such as QR codes, for obtaining digital information from printed matter, and digital watermarking is one of them. Compared with other techniques, digital watermarking is suitable for adding information to images without spoiling their design. For this purpose, digital watermarking methods for printed matter that use detection markers or image registration techniques to detect watermarked areas have been proposed. However, the detection markers themselves can damage the appearance, so the advantage of digital watermarking, namely that it does not spoil the design, is not fully exploited. On the other hand, methods using image registration techniques do not work for non-registered images. In this paper, we propose a novel digital watermarking method that uses deep learning to detect watermarked areas instead of detection markers or image registration. The proposed method introduces a deep-learning-based semantic segmentation model for detecting watermarked areas in printed matter. We prepare two datasets for training the model. The first consists of geometrically transformed non-watermarked and watermarked images; it is relatively large because the images can be generated by image processing, and it is used for pre-training. The second is obtained from photographs of non-watermarked or watermarked printed matter; it is relatively small because taking the photographs requires considerable effort and time, but pre-training allows fine-tuning with fewer training images. This dataset is used for fine-tuning to improve robustness against print-cam attacks. In the experiments, we investigated the performance of our method by implementing it on smartphones. The experimental results show that our method can carry 96 bits of information in watermarked printed matter.
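
    A minimal sketch of the two-phase training schedule described above (pre-training on synthetically transformed images, then fine-tuning on a small set of real photographs) might look as follows. It assumes PyTorch with a generic binary segmentation network; `SyntheticWatermarkDataset` and `PhotoWatermarkDataset` are hypothetical dataset objects, and the model, epoch counts, and learning rates are illustrative, not the paper's.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torchvision.models.segmentation as seg

# Two classes: background vs. watermarked area.
model = seg.deeplabv3_resnet50(weights=None, num_classes=2)
criterion = nn.CrossEntropyLoss()

def train(model, dataset, epochs, lr):
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for images, masks in loader:          # masks: (N, H, W) class indices
            out = model(images)["out"]        # (N, 2, H, W) logits
            loss = criterion(out, masks)
            opt.zero_grad()
            loss.backward()
            opt.step()

# Hypothetical dataset objects (not from the paper): yield (image, mask) pairs.
# Phase 1: pre-train on the large, synthetically transformed dataset.
train(model, SyntheticWatermarkDataset(), epochs=30, lr=1e-3)
# Phase 2: fine-tune on the small set of real photographs (print-cam).
train(model, PhotoWatermarkDataset(), epochs=10, lr=1e-4)
```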

  • Unsupervised Deep Embedded Hashing for Large-Scale Image Retrieval Open Access

    Huanmin WANG  

     
    LETTER-Image

    Publicized: 2020/07/14
    Vol: E104-A No:1
    Page(s): 343-346

    Hashing methods have proven to be effective algorithms for image retrieval. However, learning discriminative hash codes is challenging for unsupervised models. In this paper, we propose a novel discriminative image retrieval framework, named Unsupervised Deep Embedded Hashing (UDEH), which recursively learns discriminative clusters through a soft clustering model and generates similarity-preserving binary codes. We reduce the data dimension with an autoencoder and apply a binary constraint loss to reduce the quantization error. UDEH can be jointly optimized by standard stochastic gradient descent (SGD) in the embedding layer. We conducted comprehensive experiments on two popular datasets.
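
    A hedged sketch of the kind of objective UDEH describes: an autoencoder reconstruction term, a DEC-style soft-clustering term, and a binary constraint that pushes embeddings toward ±1 codes. Layer sizes, loss weights, and the soft-assignment formulation below are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UDEHLike(nn.Module):
    def __init__(self, in_dim=4096, code_bits=64, n_clusters=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU(),
                                     nn.Linear(512, code_bits))
        self.decoder = nn.Sequential(nn.Linear(code_bits, 512), nn.ReLU(),
                                     nn.Linear(512, in_dim))
        # Learnable cluster centroids in the embedding space.
        self.centroids = nn.Parameter(torch.randn(n_clusters, code_bits))

    def soft_assign(self, z):
        # Student's t-distribution soft assignment (as in DEC).
        d2 = torch.cdist(z, self.centroids) ** 2
        q = 1.0 / (1.0 + d2)
        return q / q.sum(dim=1, keepdim=True)

def loss_fn(model, x, target_p, alpha=0.1, beta=0.01):
    # target_p: sharpened target distribution derived from q (as in DEC),
    # recomputed periodically outside this function.
    z = model.encoder(x)
    recon = F.mse_loss(model.decoder(z), x)          # autoencoder term
    q = model.soft_assign(z)
    cluster = F.kl_div(q.log(), target_p, reduction="batchmean")
    binary = ((z.abs() - 1.0) ** 2).mean()           # push codes toward +/-1
    return recon + alpha * cluster + beta * binary

# Binary codes at retrieval time: sign of the embedding.
# codes = torch.sign(model.encoder(x))
```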

  • Multi-Category Image Super-Resolution with Convolutional Neural Network and Multi-Task Learning

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    PAPER-Image Processing and Video Processing

    Publicized: 2020/10/02
    Vol: E104-D No:1
    Page(s): 183-193

    This paper presents an image super-resolution technique using a convolutional neural network (CNN) and multi-task learning for multiple image categories. The image categories include natural, manga, and text images, whose features differ from one another. However, conventional CNNs for super-resolution are trained on a single category, and if the input image category differs from that of the training images, super-resolution performance is degraded. There are two possible ways to handle multiple categories with conventional CNNs. The first is to prepare a CNN for every category; this, however, requires a category classifier to select the appropriate CNN. The second is to learn all categories with a single CNN; in this case, the CNN cannot optimize its internal behavior for each category. Therefore, this paper presents a super-resolution CNN architecture for multiple image categories. The proposed CNN has two parallel outputs: a high-resolution image and a category label. The main CNN for the high-resolution image is a standard three-convolutional-layer architecture, and the sub-network for the category label branches out from its middle layer and consists of two fully connected layers. This architecture can simultaneously learn the high-resolution image and its category using multi-task learning, and the category information is used to optimize the super-resolution. In an applied setting, the proposed CNN can automatically estimate the input image category and change its internal behavior accordingly. Experimental results for 2× image magnification show that the average peak signal-to-noise ratio of the proposed method is approximately 0.22 dB higher than that of conventional super-resolution, with no difference in processing time or parameter count. We have confirmed that the proposed method is useful when the input image category varies.
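
    The architecture described above (a three-convolutional-layer super-resolution trunk with a category branch taken from the middle layer and built from two fully connected layers) can be sketched roughly as follows in PyTorch. The channel counts, kernel sizes, luminance-only input, and number of categories are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiTaskSRCNN(nn.Module):
    """SR trunk with a category-classification branch from the middle layer."""
    def __init__(self, n_categories=3):              # e.g. natural / manga / text
        super().__init__()
        self.conv1 = nn.Conv2d(1, 64, 9, padding=4)
        self.conv2 = nn.Conv2d(64, 32, 5, padding=2)   # middle layer
        self.conv3 = nn.Conv2d(32, 1, 5, padding=2)    # high-resolution output
        self.relu = nn.ReLU()
        self.category_head = nn.Sequential(            # branch: two FC layers
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64), nn.ReLU(),
            nn.Linear(64, n_categories))

    def forward(self, x):                              # x: interpolated LR luminance
        f = self.relu(self.conv2(self.relu(self.conv1(x))))
        return self.conv3(f), self.category_head(f)    # (HR image, category logits)

# Multi-task loss: pixel loss + weighted category loss (the weight is an assumption).
# sr, logits = model(lr)
# loss = F.mse_loss(sr, hr) + 0.1 * F.cross_entropy(logits, label)
```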

  • Target-Oriented Deformation of Visual-Semantic Embedding Space

    Takashi MATSUBARA  

     
    PAPER

    Publicized: 2020/09/24
    Vol: E104-D No:1
    Page(s): 24-33

    Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and modality-specific information, it is insufficient to simply embed related entities close to each other. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby providing conditional similarities between entities. Unlike methods based on cross-modal attention applied to words and cropped images, TOD-Net is a post-process applied to the embedding space learned by existing embedding systems and improves their retrieval performance. In particular, when combined with cutting-edge models, TOD-Net achieves state-of-the-art image-caption retrieval performance on the MS COCO and Flickr30k datasets. Qualitative analysis reveals that TOD-Net successfully emphasizes entity-specific concepts and retrieves diverse targets by handling higher levels of diversity than existing models.
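
    As described, TOD-Net is a post-process that deforms an already learned embedding under a target condition. The rough sketch below assumes one plausible form of such a module (concatenate the embedding with the target and predict a residual shift); the dimensions and conditioning scheme are illustrative, not the paper's exact design.

```python
import torch
import torch.nn as nn

class DeformationNet(nn.Module):
    """Deform an embedding conditioned on a target embedding (sketch)."""
    def __init__(self, dim=1024, hidden=512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(),
            nn.Linear(hidden, dim))

    def forward(self, emb, target):
        # Residual deformation: shift emb according to the retrieval target.
        shift = self.mlp(torch.cat([emb, target], dim=-1))
        return emb + shift

# Usage idea: similarity is computed between deformed embeddings of both modalities.
# s = torch.cosine_similarity(deform(img_emb, qry), deform(cap_emb, qry), dim=-1)
```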

  • A Two-Stage Approach for Fine-Grained Visual Recognition via Confidence Ranking and Fusion

    Kangbo SUN  Jie ZHU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2020/09/11
    Vol: E103-D No:12
    Page(s): 2693-2700

    The location and feature representation of an object's parts play key roles in fine-grained visual recognition. To improve the final recognition accuracy without any bounding box/part annotations, many studies adopt object location networks that propose bounding boxes/parts from category labels alone, and then crop the images into partial images to help the classification network make the final decision. In our work, to propose more informative partial images and effectively extract discriminative features from the original and partial images, we propose a two-stage approach that fuses the original features and partial features by evaluating and ranking the information content of the partial images. Experimental results show that our proposed approach achieves excellent performance on two benchmark datasets, which demonstrates its effectiveness.
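
    The two-stage idea (crop candidate parts, rank them by classification confidence, then fuse the most informative partial features with the whole-image features) can be sketched roughly as below. Using the maximum softmax score as the ranking criterion and concatenation as the fusion step are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def rank_and_fuse(backbone, classifier, image, crops, top_k=2):
    """Rank partial images by classifier confidence and fuse their features."""
    feats = [backbone(c.unsqueeze(0)) for c in crops]            # partial features
    confs = [F.softmax(classifier(f), dim=1).max().item() for f in feats]
    order = sorted(range(len(crops)), key=lambda i: confs[i], reverse=True)

    global_feat = backbone(image.unsqueeze(0))
    selected = [feats[i] for i in order[:top_k]]                 # most confident parts
    fused = torch.cat([global_feat] + selected, dim=1)           # simple fusion
    return fused    # fed to a final classifier for the fused decision
```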

  • Battery-Powered Wild Animal Detection Nodes with Deep Learning

    Hiroshi SAITO  Tatsuki OTAKE  Hayato KATO  Masayuki TOKUTAKE  Shogo SEMBA  Yoichi TOMIOKA  Yukihide KOHIRA  

     
    PAPER

    Publicized: 2020/07/01
    Vol: E103-B No:12
    Page(s): 1394-1402

    Since wild animals are causing more accidents and damage, it is important to detect them safely and as early as possible. In this paper, we propose two battery-powered wild animal detection nodes based on deep learning that automatically detect wild animals and immediately notify the people concerned. To use the proposed nodes outdoors, where power is not available, we devise power saving techniques for them; for example, deep learning is used to save power by avoiding operations when no wild animal is detected. We evaluate the operation time, power consumption, and energy consumption of the proposed nodes. We also evaluate their detection range, the accuracy of the deep learning, and the success rate of communication through field tests to demonstrate that the proposed nodes can be used to detect wild animals outdoors.

  • An Efficient Method for Training Deep Learning Networks Distributed

    Chenxu WANG  Yutong LU  Zhiguang CHEN  Junnan LI  

     
    PAPER-Fundamentals of Information Systems

    Publicized: 2020/09/07
    Vol: E103-D No:12
    Page(s): 2444-2456

    Training deep learning (DL) networks is a computationally intensive process; as a result, training time can become so long that it impedes the development of DL. High performance computing clusters, especially supercomputers, are equipped with abundant computing resources, storage resources, and efficient interconnects, and can therefore train DL networks better and faster. In this paper, we propose a method to train DL networks in a distributed manner with high efficiency. First, we propose a hierarchical synchronous Stochastic Gradient Descent (SGD) strategy, which makes full use of hardware resources and greatly increases computational efficiency. Second, we present a two-level parameter synchronization scheme that reduces communication overhead by transmitting the parameters of the first-level models through shared memory. Third, we optimize parallel I/O by making each reader read data as contiguously as possible to avoid the high overhead of discontinuous data reading. Finally, we integrate the LARS algorithm into our system. The experimental results demonstrate that our approach has tremendous performance advantages over unoptimized methods. Compared with the native distributed strategy, our hierarchical synchronous SGD strategy (HSGD) can increase computing efficiency by about 20 times.
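
    The hierarchical synchronization idea (reduce gradients inside each node first, then exchange only between node leaders) can be sketched with `torch.distributed` process groups as below. The groups are assumed to have been created with `dist.new_group`, and this sketch omits the paper's shared-memory transport, parallel I/O optimization, and LARS integration.

```python
import torch.distributed as dist

def hierarchical_allreduce(grads, local_group, leader_group,
                           node_leader_rank, is_leader, world_size):
    """Two-level gradient reduction: intra-node first, then across node leaders."""
    for g in grads:
        # Level 1: sum gradients among ranks on the same node (fast local links).
        dist.all_reduce(g, op=dist.ReduceOp.SUM, group=local_group)
        if is_leader:
            # Level 2: only one rank per node exchanges the locally reduced gradient.
            dist.all_reduce(g, op=dist.ReduceOp.SUM, group=leader_group)
        # Share the fully reduced gradient back with the other ranks on the node.
        dist.broadcast(g, src=node_leader_rank, group=local_group)
        g /= world_size          # average over all workers
```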

  • Fundamental Trial on DOA Estimation with Deep Learning Open Access

    Yuya KASE  Toshihiko NISHIMURA  Takeo OHGANE  Yasutaka OGAWA  Daisuke KITAYAMA  Yoshihisa KISHIYAMA  

     
    PAPER-Antennas and Propagation

    Publicized: 2020/04/21
    Vol: E103-B No:10
    Page(s): 1127-1135

    Direction of arrival (DOA) estimation of wireless signals has a long history but is still being investigated to improve estimation accuracy. Non-linear algorithms such as compressed sensing are now applied to DOA estimation and achieve very high performance. If the large computational loads of compressed sensing algorithms are acceptable, it may also be possible to apply a deep neural network (DNN) to DOA estimation. In this paper, we verify the on-grid DOA estimation capability of a DNN in a simple estimation scenario and discuss the effect of the training data on DNN design. Simulations show that the SNR of the training data strongly affects performance and that training data with random SNR is suitable for configuring a general-purpose DNN. The obtained DNN provides reasonably high performance, and a DNN trained on data restricted to closely spaced DOA situations provides very high performance for the closely spaced DOA cases.
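
    A minimal sketch of the on-grid setup: a uniform linear array signal model generates training samples at random SNRs, and a small fully connected network classifies over a discretized angle grid. The array size, grid resolution, input features (real and imaginary parts of the sample correlation matrix), and network shape are all illustrative assumptions.

```python
import numpy as np
import torch.nn as nn

M, GRID = 8, np.arange(-60, 61, 1)           # 8-element half-wavelength ULA, 1-degree grid

def make_sample(snr_db=None):
    """One on-grid single-source snapshot set at a random (or given) SNR."""
    if snr_db is None:
        snr_db = np.random.uniform(-10, 20)   # random-SNR training data
    k = np.random.randint(len(GRID))
    theta = np.deg2rad(GRID[k])
    a = np.exp(-1j * np.pi * np.arange(M) * np.sin(theta))        # steering vector
    s = (np.random.randn(64) + 1j * np.random.randn(64)) / np.sqrt(2)
    sigma = 10 ** (-snr_db / 20)
    n = sigma * (np.random.randn(M, 64) + 1j * np.random.randn(M, 64)) / np.sqrt(2)
    x = np.outer(a, s) + n
    R = x @ x.conj().T / x.shape[1]                               # sample correlation
    feat = np.concatenate([R.real.ravel(), R.imag.ravel()]).astype(np.float32)
    return feat, k                                                # k: grid-index label

net = nn.Sequential(nn.Linear(2 * M * M, 256), nn.ReLU(),
                    nn.Linear(256, 256), nn.ReLU(),
                    nn.Linear(256, len(GRID)))    # one output per grid angle
# Trained as a classifier over the grid index k with nn.CrossEntropyLoss.
```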

  • Weight Compression MAC Accelerator for Effective Inference of Deep Learning Open Access

    Asuka MAKI  Daisuke MIYASHITA  Shinichi SASAKI  Kengo NAKATA  Fumihiko TACHIBANA  Tomoya SUZUKI  Jun DEGUCHI  Ryuichi FUJIMOTO  

     
    PAPER-Integrated Electronics

    Publicized: 2020/05/15
    Vol: E103-C No:10
    Page(s): 514-523

    Many studies of deep neural networks have reported inference accelerators for improved energy efficiency. We propose methods for further improving energy efficiency while maintaining recognition accuracy, which were developed by the co-design of a filter-by-filter quantization scheme with variable bit precision and a hardware architecture that fully supports it. Filter-wise quantization reduces the average bit precision of weights, so execution times and energy consumption for inference are reduced in proportion to the total number of computations multiplied by the average bit precision of weights. The hardware utilization is also improved by a bit-parallel architecture suitable for granularly quantized bit precision of weights. We implement the proposed architecture on an FPGA and demonstrate that the execution cycles are reduced to 1/5.3 for ResNet-50 on ImageNet in comparison with a conventional method, while maintaining recognition accuracy.
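
    Filter-wise (per-output-channel) quantization with a variable bit width can be sketched in software as below: each filter gets its own scale and its own bit precision, lowering the average bit precision of the layer's weights. The rule used here to assign bits to a filter is a purely illustrative assumption, not the paper's criterion.

```python
import torch

def quantize_filterwise(weight, bit_choices=(2, 4, 8)):
    """Per-output-channel symmetric quantization with a per-filter bit width.

    weight: (out_channels, in_channels, kH, kW) convolution weights.
    Returns de-quantized weights and the bits assigned to each filter.
    """
    out = torch.empty_like(weight)
    bits_used = []
    layer_max = weight.abs().max()
    for c in range(weight.size(0)):
        w = weight[c]
        # Illustrative policy: filters with a larger dynamic range keep more bits.
        ratio = (w.abs().max() / layer_max).item()
        bits = bit_choices[min(int(ratio * len(bit_choices)), len(bit_choices) - 1)]
        qmax = 2 ** (bits - 1) - 1
        scale = max(w.abs().max().item(), 1e-8) / qmax
        out[c] = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
        bits_used.append(bits)
    return out, bits_used
```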

  • Efficient Salient Object Detection Model with Dilated Convolutional Networks

    Fei GUO  Yuan YANG  Yong GAO  Ningmei YU  

     
    PAPER-Image Recognition, Computer Vision

    Publicized: 2020/07/17
    Vol: E103-D No:10
    Page(s): 2199-2207

    The introduction of Fully Convolutional Networks (FCNs) has led to record progress in salient object detection models. However, in order to retain the input resolution, deconvolutional networks with unpooling are applied on top of the FCNs, which increases the computation and the model size of the segmentation task. In addition, most deep-learning-based methods completely discard saliency prior knowledge, even though it has been shown to be effective. Therefore, an efficient salient object detection method based on deep learning is proposed in our work. In this model, dilated convolutions are exploited in the network to produce high-resolution output without pooling or added deconvolutional networks. In this way, the parameters and depth of the network are decreased sharply compared with traditional FCNs. Furthermore, a manifold ranking model is explored for saliency refinement to maintain spatial consistency and preserve contours. Experimental results verify that the performance of our method is superior to other state-of-the-art methods. Meanwhile, the proposed model has the smallest model size and the fastest processing speed, making it more suitable for wearable processing systems.
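
    The core architectural point (dilated convolutions keep the output at the input resolution without pooling or deconvolution) is illustrated by the small block below; the channel counts and dilation rates are illustrative assumptions, not the paper's configuration.

```python
import torch.nn as nn

# Dilated block: the receptive field grows with the dilation rate while the
# spatial resolution stays equal to the input (no pooling, no deconvolution).
dilated_block = nn.Sequential(
    nn.Conv2d(3,   64, 3, padding=1, dilation=1), nn.ReLU(),
    nn.Conv2d(64,  64, 3, padding=2, dilation=2), nn.ReLU(),
    nn.Conv2d(64, 128, 3, padding=4, dilation=4), nn.ReLU(),
    nn.Conv2d(128, 1, 1),          # 1-channel saliency map, same H x W as the input
)
```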

  • Sentence-Embedding and Similarity via Hybrid Bidirectional-LSTM and CNN Utilizing Weighted-Pooling Attention

    Degen HUANG  Anil AHMED  Syed Yasser ARAFAT  Khawaja Iftekhar RASHID  Qasim ABBAS  Fuji REN  

     
    PAPER-Natural Language Processing

    Publicized: 2020/08/27
    Vol: E103-D No:10
    Page(s): 2216-2227

    Neural networks have received considerable attention in sentence similarity measurement because of their efficiency in handling semantic composition. However, existing neural network methods are not sufficiently effective at capturing the most significant semantic information buried in an input. To address this problem, a novel weighted-pooling attention layer is proposed to retain the most remarkable attention vector. It has already been established that long short-term memory and convolutional neural networks have a strong ability to accumulate enriched patterns of whole-sentence semantic representation. First, a sentence representation is generated by employing a siamese structure based on bidirectional long short-term memory and a convolutional neural network. Subsequently, a weighted-pooling attention layer is applied to obtain an attention vector. Finally, the attention vector pair is leveraged to calculate the sentence similarity score. The amalgamation of bidirectional long short-term memory and a convolutional neural network results in a model with enhanced information extraction and learning capacity. Investigations show that the proposed method outperforms state-of-the-art approaches on datasets for two tasks, namely semantic relatedness and Microsoft Research paraphrase identification. The new model improves learning capability and also boosts similarity accuracy.
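
    The weighted-pooling attention layer described above (an attention vector that pools the encoder outputs into a single sentence vector) can be sketched as follows; the scoring function, a small learned projection with softmax weights over time steps, is a common formulation assumed here for illustration.

```python
import torch
import torch.nn as nn

class WeightedPoolingAttention(nn.Module):
    """Pool a sequence of hidden states into one attention-weighted vector."""
    def __init__(self, hidden_dim):
        super().__init__()
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, h):                                # h: (batch, seq_len, hidden_dim)
        weights = torch.softmax(self.score(h), dim=1)    # (batch, seq_len, 1)
        return (weights * h).sum(dim=1)                  # (batch, hidden_dim)

# In a siamese setup, both sentences pass through the shared BiLSTM/CNN encoder
# and this pooling layer; similarity is then computed between the two vectors.
# sim = torch.cosine_similarity(pool(enc(s1)), pool(enc(s2)), dim=-1)
```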

  • Deep Learning Approaches for Pathological Voice Detection Using Heterogeneous Parameters

    JiYeoun LEE  Hee-Jin CHOI  

     
    LETTER-Speech and Hearing

    Publicized: 2020/05/14
    Vol: E103-D No:8
    Page(s): 1920-1923

    We propose a deep learning-based model for classifying pathological voices using a convolutional neural network and a feedforward neural network. The model uses combinations of heterogeneous parameters, including mel-frequency cepstral coefficients, linear predictive cepstral coefficients and higher-order statistics. We validate the accuracy of this model using the Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database and the Saarbruecken Voice Database (SVD). Our model achieved an accuracy of 99.3% for MEEI and 75.18% for SVD. This model achieved an accuracy that is 7.18% higher than that of competitive models in previous studies.
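
    A sketch of extracting the kinds of heterogeneous parameters listed (MFCCs, linear-prediction coefficients, and higher-order statistics) from a voice recording with librosa and scipy. The orders, statistics, and the use of plain LPC coefficients in place of LPCCs are simplifying assumptions for illustration.

```python
import numpy as np
import librosa
from scipy.stats import skew, kurtosis

def heterogeneous_features(path, sr=16000, n_mfcc=13, lpc_order=12):
    """MFCC + LPC + higher-order-statistics feature vector for one recording."""
    y, sr = librosa.load(path, sr=sr)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (n_mfcc, frames)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

    lpc = librosa.lpc(y, order=lpc_order)                     # LPC coefficients
                                                              # (stand-in for LPCC)
    hos = np.array([skew(y), kurtosis(y)])                    # higher-order statistics

    return np.concatenate([mfcc_stats, lpc[1:], hos])         # one feature vector

# Such vectors can feed a feedforward classifier, while the raw MFCC
# time-frequency matrix can feed a CNN branch, roughly mirroring the
# CNN + feedforward combination described above.
```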

  • Improving Faster R-CNN Framework for Multiscale Chinese Character Detection and Localization

    Minseong KIM  Hyun-Chul CHOI  

     
    LETTER-Pattern Recognition

    Publicized: 2020/04/06
    Vol: E103-D No:7
    Page(s): 1777-1781

    Faster R-CNN uses a region proposal network, which consists of a single-scale convolution filter and fully connected networks, to localize detected regions. However, a single-scale filter is not enough to detect the full regions of characters. In this letter, we propose a simple but effective approach, namely utilizing variously sized convolution filters, to accurately detect Chinese characters of multiple scales in documents. We experimentally verified that our method improves IoU by 4% and the detection rate by 3% compared with the previous single-scale Faster R-CNN method.
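
    The key modification (replacing the single-scale convolution in the region proposal network with variously sized filters) can be sketched as parallel convolution branches merged before the objectness and regression heads; the kernel sizes and merge-by-summation below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleRPNHead(nn.Module):
    """RPN intermediate layer with several filter sizes instead of a single 3x3."""
    def __init__(self, in_channels=512, mid_channels=512, num_anchors=9):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_channels, mid_channels, k, padding=k // 2)
            for k in (3, 5, 7)                    # variously sized filters
        ])
        self.cls = nn.Conv2d(mid_channels, num_anchors * 2, 1)   # object / not object
        self.reg = nn.Conv2d(mid_channels, num_anchors * 4, 1)   # box deltas

    def forward(self, feat):
        h = torch.relu(sum(b(feat) for b in self.branches))      # merge the scales
        return self.cls(h), self.reg(h)
```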

  • A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation

    Lu YIN  Junfeng LI  Yonghong YAN  Masato AKAGI  

     
    PAPER-Speech and Hearing

    Publicized: 2020/04/20
    Vol: E103-D No:7
    Page(s): 1732-1743

    Simultaneous utterances impair the listening ability of hearing-impaired persons and the performance of automatic speech recognition systems. Recently, deep neural networks have dramatically improved speech separation performance. However, most previous works estimate only the speech magnitude and use the mixture phase for speech reconstruction, and the use of the mixture phase has become a critical limitation on separation performance. This study proposes a two-stage phase-aware approach for multi-talker speech separation, which recovers both the magnitude and the phase. For phase recovery, the Multiple Input Spectrogram Inversion (MISI) algorithm is utilized because of its effectiveness and simplicity. The study implements the MISI algorithm based on a mask and shows that the ideal amplitude mask (IAM) is the optimal mask for mask-based MISI phase recovery, as it introduces less phase distortion. To compensate for the error of phase recovery and minimize signal distortion, an advanced mask is proposed for magnitude estimation. The IAM and the proposed mask are estimated at different stages to recover the phase and the magnitude, respectively. Two neural network frameworks are evaluated for magnitude estimation in the second stage, demonstrating the effectiveness and flexibility of the proposed approach. The experimental results demonstrate that the proposed approach significantly reduces the distortion of the separated speech.
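
    Two pieces of the pipeline can be written compactly: the ideal amplitude mask, IAM_i = |S_i|/|Y|, and a mask-based MISI-style iteration that keeps the estimated magnitudes fixed while redistributing the mixture-consistency error. The STFT parameters and the simplified MISI loop below are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
import librosa

N_FFT, HOP = 512, 128

def iam(source, mixture):
    """Ideal amplitude mask: |S_i| / |Y| (clipped for numerical safety)."""
    S = np.abs(librosa.stft(source, n_fft=N_FFT, hop_length=HOP))
    Y = np.abs(librosa.stft(mixture, n_fft=N_FFT, hop_length=HOP))
    return S / np.maximum(Y, 1e-8)

def misi(mixture, est_mags, n_iter=10):
    """Phase recovery: keep estimated magnitudes, enforce mixture consistency."""
    Y = librosa.stft(mixture, n_fft=N_FFT, hop_length=HOP)
    phases = [np.angle(Y)] * len(est_mags)             # start from the mixture phase
    for _ in range(n_iter):
        sigs = [librosa.istft(m * np.exp(1j * p), hop_length=HOP, length=len(mixture))
                for m, p in zip(est_mags, phases)]
        err = (mixture - sum(sigs)) / len(sigs)         # distribute the mixture error
        phases = [np.angle(librosa.stft(s + err, n_fft=N_FFT, hop_length=HOP))
                  for s in sigs]
    return [m * np.exp(1j * p) for m, p in zip(est_mags, phases)]
```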

  • Improvement of Luminance Isotropy for Convolutional Neural Networks-Based Image Super-Resolution

    Kazuya URAZOE  Nobutaka KUROKI  Yu KATO  Shinya OHTANI  Tetsuya HIROSE  Masahiro NUMA  

     
    LETTER-Image

    Vol: E103-A No:7
    Page(s): 955-958

    Convolutional neural network (CNN)-based image super-resolution is widely used as a high-quality image-enhancement technique. However, such networks generally show little to no luminance isotropy. We therefore propose two methods, “Luminance Inversion Training (LIT)” and “Luminance Inversion Averaging (LIA),” to improve the luminance isotropy of CNN-based image super-resolution. Experimental results for 2× image magnification show that the average peak signal-to-noise ratio (PSNR) with Luminance Inversion Averaging is about 0.15-0.20 dB higher than that of conventional super-resolution.
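
    Of the two methods, Luminance Inversion Averaging can be stated directly: run super-resolution on the image and on its luminance-inverted version, re-invert the second result, and average the two outputs (Luminance Inversion Training augments the training data analogously). The sketch assumes luminance values normalized to [0, 1] and an existing single-image super-resolution function `sr_model`.

```python
def luminance_inversion_averaging(sr_model, y):
    """y: luminance channel in [0, 1], shape (N, 1, H, W)."""
    normal = sr_model(y)                    # SR of the original luminance
    inverted = 1.0 - sr_model(1.0 - y)      # SR of the inverted image, re-inverted
    return 0.5 * (normal + inverted)        # averaging restores luminance isotropy
```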

  • Intrusion Detection System Using Deep Learning and Its Application to Wi-Fi Network

    Kwangjo KIM  

     
    INVITED PAPER

    Publicized: 2020/03/31
    Vol: E103-D No:7
    Page(s): 1433-1447

    Deep learning is attracting increasing attention and achieving better performance in implementing Intrusion Detection Systems (IDS), especially for feature learning. This paper presents state-of-the-art advances and challenges in IDS using deep learning models, which have achieved greater performance gains than traditional methods in computer vision, natural language processing, and image/audio processing. After providing a systematic and methodical description of the latest developments in deep learning in terms of the deployed architectures and techniques, we discuss the pros and cons of deep learning-based IDS and the importance of deep learning models as a feature learning approach. For this, the author proposes the concept of Deep-Feature Extraction and Selection (D-FES). By combining stacked feature extraction and weighted feature selection in D-FES, our experiments achieved a detection rate of 99.918% and a false alarm rate of 0.012% in detecting impersonation attacks in a Wi-Fi network, which is better than previous publications. A summary and further challenges are given as concluding remarks.

  • Deep State-Space Model for Noise Tolerant Skeleton-Based Action Recognition

    Kazuki KAWAMURA  Takashi MATSUBARA  Kuniaki UEHARA  

     
    PAPER

    Publicized: 2020/03/18
    Vol: E103-D No:6
    Page(s): 1217-1225

    Action recognition using skeleton data (3D coordinates of human joints) is an attractive topic due to its robustness to the actor's appearance, camera's viewpoint, illumination, and other environmental conditions. However, skeleton data must be measured by a depth sensor or extracted from video data using an estimation algorithm, and doing so risks extraction errors and noise. In this work, for robust skeleton-based action recognition, we propose a deep state-space model (DSSM). The DSSM is a deep generative model of the underlying dynamics of an observable sequence. We applied the proposed DSSM to skeleton data, and the results demonstrate that it improves the classification performance of a baseline method. Moreover, we confirm that feature extraction with the proposed DSSM renders subsequent classifications robust to noise and missing values. In such experimental settings, the proposed DSSM outperforms a state-of-the-art method.

  • Orthogonal Gradient Penalty for Fast Training of Wasserstein GAN Based Multi-Task Autoencoder toward Robust Speech Recognition

    Chao-Yuan KAO  Sangwook PARK  Alzahra BADI  David K. HAN  Hanseok KO  

     
    LETTER-Speech and Hearing

    Publicized: 2020/01/27
    Vol: E103-D No:5
    Page(s): 1195-1198

    Performance in Automatic Speech Recognition (ASR) degrades dramatically in noisy environments. To alleviate this problem, a variety of deep networks based on convolutional neural networks and recurrent neural networks have been proposed, trained with an L1 or L2 loss. In this letter, we propose a new orthogonal gradient penalty (OGP) method for Wasserstein Generative Adversarial Networks (WGAN) applied to denoising and despeeching models. The WGAN integrates a multi-task autoencoder which estimates not only speech features but also noise features from noisy speech. While achieving a 14.1% improvement in the Wasserstein distance convergence rate, the proposed OGP-enhanced features are tested in ASR and achieve 9.7%, 8.6%, 6.2%, and 4.8% WER improvements over the DDAE, MTAE, R-CED(CNN), and RNN models.

  • Vehicle Key Information Detection Algorithm Based on Improved SSD

    Ende WANG  Yong LI  Yuebin WANG  Peng WANG  Jinlei JIAO  Xiaosheng YU  

     
    PAPER-Intelligent Transport System

    Vol: E103-A No:5
    Page(s): 769-779

    With the rapid development of technology and the economy, the number of cars is increasing rapidly, which brings a series of traffic problems. To solve these problems, the development of intelligent transportation systems is being accelerated in many cities. While the detection of vehicles and their detailed information is of great significance to the development of urban intelligent transportation systems, traditional vehicle detection algorithms are not satisfactory in complex environments or under strict real-time requirements, and vehicle detection algorithms based on motion information cannot detect stationary vehicles in video. At present, the application of deep learning to target detection effectively alleviates the problems of traditional algorithms. However, there are few datasets for detailed vehicle information, i.e., driver, car inspection sign, copilot, plate, and vehicle object, which are key information for intelligent transportation. This paper constructs a deep learning dataset containing 10,000 representative images of vehicles and their key information. Then, the SSD (Single Shot MultiBox Detector) target detection algorithm is improved and the improved algorithm is applied to a video surveillance system. The detection accuracy for small targets is improved by adding deconvolution modules to the detection network. The experimental results show that the proposed method can simultaneously detect the vehicle, driver, car inspection sign, copilot, and plate, which are the key vehicle information, and that the improved algorithm achieves better accuracy and real-time performance in video surveillance than the original SSD algorithm.
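
    The stated improvement (adding deconvolution modules so that shallow, high-resolution feature maps are enriched with upsampled deeper features, which helps small targets such as plates and inspection signs) can be sketched as below; the channel counts and element-wise fusion are illustrative assumptions, not the paper's exact module.

```python
import torch.nn as nn

class DeconvFusion(nn.Module):
    """Upsample a deep SSD feature map and fuse it with a shallower one."""
    def __init__(self, deep_ch, shallow_ch, out_ch=256):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(deep_ch, out_ch, kernel_size=2, stride=2)
        self.lateral = nn.Conv2d(shallow_ch, out_ch, kernel_size=1)
        self.relu = nn.ReLU()

    def forward(self, deep, shallow):
        up = self.deconv(deep)                          # 2x upsampling of the deep map
        return self.relu(up + self.lateral(shallow))    # fused map for small objects

# The fused maps replace (or supplement) the SSD source layers that feed the
# class/box prediction heads for small objects such as plates.
```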

81-100 hits (of 149)